Having cleaned and adapted the dataset to my needs, I can start exploring.
For example, I can query how many individual contributions each candidate received, e.g. Hillary Clinton:
QUESTION 1: Who got the most money through contributions?
In this section I’ll take a look at money and amounts. I’m wondering which candidate got the most contributions, and whether that is the same person who got the most money through contributions. How are the contribution sizes distributed across the candidates? Which gender received more contributions, or more money? Which political party?
PART 1: the mean and distributions

Constructing a frequency plot reveals something about Sanders, Bernard (who has a big dot low down). Some candidates have a narrow distribution and some a wider one. There is another person with a big dot at the bottom, yet also a fairly big one higher up (I think this is Clinton, Hillary), and one more person with a very wide distribution up and down, but seemingly also a strong base (Cruz, Ted?).


Cleaned up the graph a bit and reordered ascending! This plot gives a nice overview of the mean contributions per candidate! Maybe it’s a keeper ; )

PART 2: amounts
Most of the contributions are below 1000 $!
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


PART 2.1: refunds
The previous graph gives me negative values for amounts, which is at first very confusing. So I construct a vector holding only the refunds.
## [1] 1498
Seems there are nearly 1500 “contributors” who actually got more money refunded than what they gave.
And some gave some, but also got some back (or got some back multiple times?):
## [1] 2102
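Here is a minimal sketch of one way counts like these two could be computed in base R. The column names `contbr_nm` and `contb_receipt_amt` are assumptions on my side, and the toy data is made up, so this is only an illustration of the idea, not necessarily the exact computation behind the numbers above.

```r
# Toy data: made-up contributors and amounts (negative amounts are refunds).
# The column names contbr_nm and contb_receipt_amt are assumed.
contributions <- data.frame(
  contbr_nm = c("A", "A", "B", "B", "C", "D"),
  contb_receipt_amt = c(100, -250, 500, -200, 50, -75)
)

# Net amount per contributor: sum of all positive and negative receipts.
net <- tapply(contributions$contb_receipt_amt, contributions$contbr_nm, sum)

# Contributors who got back more than they gave (negative net amount).
net_refunded <- names(net)[net < 0]
length(net_refunded)

# Contributors with at least one contribution AND at least one refund.
gave <- unique(contributions$contbr_nm[contributions$contb_receipt_amt > 0])
got_back <- unique(contributions$contbr_nm[contributions$contb_receipt_amt < 0])
both <- intersect(gave, got_back)
length(both)
```

Summing per contributor first is what separates "got more back than they gave" from "gave some and got some back".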
So I tried plotting these refunds. The full plot was too big, so I’ll only take a look at the largest dollar amounts.

This is a lot of money to get back. I wonder why and how. But I must admit I don’t really understand US politics and how the candidates are funded. Here’s some info: https://ballotpedia.org/California_Proposition_34,_Limits_on_Campaign_Contributions_(2000)
PART 2.2: contributions
Well, so let’s see who gave the most:

## [1] 24
There are 24 people who all gave the same amount, so I suspect that there is an upper limit around 10,000$ (maybe before taxes, or such? Or are these already PACs?)
I also spotted potentially erroneous data, with the contributor DE GROOTE, DOUG MR. being listed, as well as a DE GROOTE, DOUG, both with the same amount, which makes me believe that it is a mistake in the data (or a rather badly executed way of increasing one’s contribution limit…)
Maybe I can check :)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
## [23] TRUE TRUE
Yep, it seems that some of those transactions are listed more than once in the dataset. Actually quite a lot of them! 7 TRUE!
It seems that DE GROOTE could also be a company, because the name is also listed in the contbr_employer column. Well. But I’m not gonna go hunt down these individuals :)
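A check like this can be sketched with `duplicated()`, which flags every row that repeats an earlier one. The data below is made up, and all column names are assumptions. The title-stripping step is just one hypothetical way to also catch near-duplicates such as a name listed once with and once without "MR.".

```r
# Toy data: made-up top contributions (names and values are illustrative only).
top <- data.frame(
  contbr_nm = c("DOE, JANE", "DOE, JANE", "ROE, RICHARD", "DOE, JANE MS."),
  cand_nm = c("X", "X", "Y", "X"),
  contb_receipt_amt = c(10800, 10800, 10800, 10800),
  contb_receipt_dt = c("01-JAN-16", "01-JAN-16", "02-JAN-16", "01-JAN-16")
)

# TRUE marks a row that is an exact copy of an earlier row.
dup <- duplicated(top)
dup
sum(dup)

# Hypothetical normalisation: strip a trailing title before comparing,
# so "DOE, JANE MS." and "DOE, JANE" count as the same contributor.
top_norm <- top
top_norm$contbr_nm <- sub("\\s+(MR|MRS|MS|DR)\\.?$", "", top_norm$contbr_nm)
dup_norm <- duplicated(top_norm)
sum(dup_norm)
```

Note that `duplicated()` never flags the first occurrence, so the number of TRUE values counts the extra copies, not the affected transactions.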
PART 3: gender and money
Number of contributions per gender of candidate

Total amount of money through contributions per gender of candidate

Whoa! It seems that female candidates received nearly as much money in contributions as male candidates did, even though there are only 3 female vs. 19 male candidates, and there were far fewer contributions to female candidates than to male candidates!
PART 3.1: binning
It could be interesting to take a look at the mean contribution per F/M candidate.
So here are the summaries for the contributions for the female candidates, then for the male candidates, and finally for the whole dataset combined:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400.0 25.0 100.0 501.7 250.0 5400.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -10000.0 25.0 50.0 179.2 100.0 10800.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -10000.0 25.0 50.0 262.4 100.0 10800.0
I’ll start with a generous binwidth of 1000:

This graphic is pretty useless - nearly every contribution falls within the first bin, and there are very few above 1000. So I can adjust the binning:

Whoa! Most contributions are actually between 0-100$!! So there are many many small contributions that were made in CA. Let’s look at this data in a table:
##
## (0,100] (100,200] (200,300] (300,400]
## 135435 7100 12498 493
## (400,500] (500,600] (600,700] (700,800]
## 5781 140 214 144
## (800,900] (900,1e+03] (1e+03,2e+03] (2e+03,3e+03]
## 47 4777 1448 9823
## (3e+03,1.1e+04]
## 445
Interesting, because I already saw that most low contributions went to one candidate: Sanders, Bernard.
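A binned table like the one above can be produced with `cut()` and `table()`. This is a small sketch with made-up amounts. The break points are read off the bin labels in the table, everything else is an assumption.

```r
# Made-up contribution amounts; the negative value stands in for a refund.
amounts <- c(-50, 5, 25, 100, 150, 250, 1000, 2700, 5400)

# Breaks matching the bin labels above: steps of 100 up to 1000,
# then wider bins up to 11000.
breaks <- c(seq(0, 1000, by = 100), 2000, 3000, 11000)

# cut() uses right-closed intervals by default, so 100 falls into (0,100].
# Values outside the breaks (refunds, zero) become NA.
bins <- cut(amounts, breaks = breaks)

# table() drops the NA values by default.
table(bins)
```

Because the intervals are closed on the right, round amounts like 100$ or 1000$ are counted in the lower bin, which matters a lot here since so many people give round amounts.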
PART 3.2: proportions
Next, I’ll display the proportions of female and male candidates in the respective parties.

The proportion of candidates of each gender running in the presidential election differs greatly between the three parties.
Let’s see a graph plotting the statistical percentage of money per candidate per gender

But of course that’s VERY misleading!… Clinton, Hillary got so much of the amounts contributed to female candidates, that we should make this more clear:

As we can see, Hillary Clinton received by far the largest amount of money through contributions. She nearly single-handedly takes half of all the contributions made.
PART 4: party money
I can also see that Clinton and Sanders make up the two biggest single sections, and they are both Democrats. Therefore it would be interesting to plot money per party instead of per gender.

Here we can see that the Democrats received way more money through contributions than the Republicans did, and that Hillary Clinton alone received more money than all the Republican candidates combined. The contributions to the Green Party, in contrast, are so tiny that they become invisible in this visualization.
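The numbers behind a plot like this can be sketched as a simple aggregation. The `party` column is an assumption (it is not part of the raw data and would have to be added per candidate), and the amounts are made up.

```r
# Toy data: made-up amounts; the party column is assumed to have been
# added to the data beforehand.
contributions <- data.frame(
  cand_nm = c("Clinton", "Sanders", "Cruz", "Stein", "Clinton"),
  party = c("Democrat", "Democrat", "Republican", "Green", "Democrat"),
  contb_receipt_amt = c(2700, 27, 500, 50, 1000)
)

# Total money per party.
party_total <- tapply(contributions$contb_receipt_amt, contributions$party, sum)

# Share of all money per party, in percent.
party_share <- round(100 * party_total / sum(party_total), 1)
sort(party_share, decreasing = TRUE)
```

Looking at the shares as numbers also helps with parties whose slice is too small to see in the plot.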
PART 5: candidate money
Number of contributions per candidate:

##
## Bush, Jeb Carson, Benjamin S.
## 2762 21045
## Christie, Christopher J. Clinton, Hillary Rodham
## 316 42063
## Cruz, Rafael Edward 'Ted' Fiorina, Carly
## 21645 4426
## Graham, Lindsey O. Huckabee, Mike
## 331 447
## Jindal, Bobby Kasich, John R.
## 31 701
## Lessig, Lawrence O'Malley, Martin Joseph
## 372 383
## Pataki, George E. Paul, Rand
## 20 4117
## Perry, James R. (Rick) Rubio, Marco
## 116 7994
## Sanders, Bernard Santorum, Richard J.
## 72179 79
## Stein, Jill Trump, Donald J.
## 85 590
## Walker, Scott Webb, James Henry Jr.
## 670 106
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

While most are vanishingly small in comparison, four candidates stick out in terms of number of contributions.
Let’s look at the money amassed through the contributions:
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

These plots are not very interesting and don’t tell me much that is new, so enough of this topic for now.
QUESTION 2: Which city contributed the most, and to whom?

The next one is actually quite interesting, because it shows how people often give round amounts of money (the distinct vertical lines).

Here I’ll apply geom_jitter() only to the actual contributions, leaving out the refunds.
It nicely shows how much most people donate (first big bar to the right of 0), which cities had the most contributions (horizontal black lines), and where there are common discrete jumps, maybe related to regulations such as donation limits (vertical lines). This is a nice graph :)

Trying to remove cities and keep only the ones with high contribution amounts:

Hm… this is not so interesting, so maybe I should put it in relation to the number of inhabitants right away.
Getting Population estimates for CA
I found some here: https://www.census.gov/popest/data/cities/totals/2012/SUB-EST2012.html. The data is for 2012, but it’s the most current one listing cities that I could find there. It’s not perfect, but for now I just want to take a look :)

It could be interesting to display this with a facet_wrap(). However, there are too many cities. So maybe I should try this instead:
- calculating the ratios for contributions/inhabitant
- subsetting into categories of lower quartile, mean, upper quartile
- showing one plot for maybe the mean of each group
Calculating the Ratios
Oops, there is also data for “counties” in here, which now have the same names as some of the cities… so I’ll have to remove the rows with the higher population values.
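The steps above could be sketched roughly like this. All names and numbers below are made up (they are not census figures), and splitting at the quartiles is just one possible reading of the grouping idea.

```r
# Made-up contribution totals per city (column names are assumptions).
city_totals <- data.frame(
  city = c("ALPHA", "BETA", "GAMMA"),
  total_amt = c(50000, 12000, 9000)
)

# Made-up population table in which a county shares its name with a city.
population <- data.frame(
  city = c("ALPHA", "BETA", "BETA", "GAMMA", "GAMMA"),
  pop = c(400000, 505000, 947000, 78000, 139000)
)

# For duplicated names keep only the row with the lower population,
# assuming the county is always larger than the city of the same name.
population <- population[order(population$city, population$pop), ]
population <- population[!duplicated(population$city), ]

# Join both tables and compute dollars contributed per inhabitant.
merged <- merge(city_totals, population, by = "city")
merged$amt_per_capita <- merged$total_amt / merged$pop

# Group the cities at the lower and upper quartile of the ratio.
q <- quantile(merged$amt_per_capita, probs = c(0.25, 0.75))
merged$group <- cut(merged$amt_per_capita,
                    breaks = c(-Inf, q, Inf),
                    labels = c("low", "middle", "high"))
merged
```

With the three groups in place, one plot per group (for example of the group mean) becomes feasible where a facet per city was not.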
QUESTION 3: How much did the NOT EMPLOYED give, and to whom?
The topic of homelessness in the US perplexes me and matters a lot to me, so anything that goes in that direction rings a bell with me. Here I’m trying to investigate a little the political leanings that homeless people might have.
However, I understand that this is highly hypothetical, because I only have data on the monetary contributions that “NOT EMPLOYED” people gave to the presidential campaigns. Of course, giving money != political orientation. It might be a good proxy, but it is not an exhaustive factor: many people have a political orientation without having contributed money to a campaign. This might be especially true for homeless people, who are very likely to have very little money at hand. Further, NOT EMPLOYED != homeless. There are quite a few people in the US who are employed but homeless. Assuming that they made a contribution, they would fall into a different category.
These graphs are not gonna show much about how much money was flowing. Rather, they are intended as a proxy for where a certain section of society leans politically.
Who gave how much?

Okay, this is not so exciting :) Wait…
##
## employed not employed
## 159985 20446
There are very few NOT EMPLOYED compared to those with employment, so I’ll need ratios.
Displaying the proportions of contributions that people with/without employment gave to each party:
## Warning: Removed 47 rows containing non-finite values (stat_sum).

Interesting: it seems that a much higher percentage of people without employment contribute to the Democrats.
NOTE: I boosted the max_size variable in order to make it clearer how vanishingly small the percentage of NOT EMPLOYED people who contributed to the Republican party is.
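The ratios behind such a plot can also be looked at as a plain table. Here is a minimal sketch using `prop.table()`. The `employment` and `party` columns are assumptions, and the rows are made up.

```r
# Toy data: one row per contribution (made up); both columns are assumed
# to have been derived from the raw data beforehand.
contributions <- data.frame(
  employment = c("employed", "employed", "employed", "employed",
                 "not employed", "not employed", "not employed"),
  party = c("Democrat", "Republican", "Republican", "Democrat",
            "Democrat", "Democrat", "Republican")
)

# Number of contributions per employment status and party.
counts <- table(contributions$employment, contributions$party)

# Row-wise proportions: within each employment group, the share of
# contributions that went to each party (every row sums to 1).
shares <- prop.table(counts, margin = 1)
round(100 * shares, 1)
```

Normalising within each row is what makes the small NOT EMPLOYED group comparable to the much larger employed group.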
So, let’s see for whom…
## Warning: Removed 47 rows containing non-finite values (stat_sum).

Sanders, Bernard gets percentage-wise the most contributions from the NOT EMPLOYED! And it seems that, apart from Trump, Donald J., there is no Republican currently left on the ballot who got contributions from NOT EMPLOYED people.
Increasing max_size here as well makes it even clearer how great the difference between Sanders, Bernard and the other candidates is in this respect.
NOTE: NOT EMPLOYED probably also includes college students! Which makes sense because Sanders wants to take away college debt, so he has a big bunch of the students on his side (I heard of 83% somewhere).
Let’s look at this in another way:

This sounds like a pie chart, haha, also with the implications of who gets the biggest piece :)
Thinking about the occupation-voter distribution, I’m now wondering which party IT people lean towards. :)
So I collected all the unique jobs present in the data.frame.
## [1] 8958
Haha, found one row listing contbr_occupation : GRANDPA !! :)
But well, these are too many, and it seems that people just put in whatever they wanted. It seems the entries haven’t been checked and grouped. I don’t want to get into this. So let’s rather wrap it up :)